Additional Parameters for CSV, JSON, and Parquet File Formats

When you configure the CSV, JSON, or Parquet file format, you can specify additional parameters beyond basic options such as field delimiter and parse header, for example ignoreLeadingWhiteSpace or multiLine. These parameters control how the file is read and interpreted during data ingestion or processing, and they help ensure that the data is correctly structured and aligned with your schema before ingestion.

For example, selecting the correct Field Delimiter ensures that the data integration tool splits the data into the correct columns during ingestion. Parse Header specifies whether the first line of a CSV file is treated as a header row containing column names; setting it correctly ensures that header values are not loaded as data.
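
As an illustration only, the following is a minimal sketch of how these two settings might be expressed in a Spark-based ingestion job using PySpark; the session setup and file path are placeholders, and the option names sep and header are the Spark equivalents assumed here for Field Delimiter and Parse Header.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-delimiter-header-demo").getOrCreate()

# "sep" plays the role of Field Delimiter: it tells the reader where one
# column ends and the next begins. "header" plays the role of Parse Header:
# when True, the first line supplies column names and is not loaded as data.
df = (
    spark.read
    .option("sep", "|")                  # placeholder: pipe-delimited source file
    .option("header", True)
    .csv("/data/input/customers.csv")    # placeholder path
)
df.printSchema()
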

CSV

Using additional parameters for the CSV file format ensures:

  • Data is parsed correctly.

  • Schema mismatches or incorrect column alignment are prevented.

  • Support is provided for data from multiple sources with different formatting conventions.

The following additional parameters are supported for the CSV file format; an illustrative usage sketch follows the table.

Parameter Description
comment Specifies the character that identifies comment lines in the file. Any line beginning with this character is ignored during processing. Example: Setting # treats all lines starting with # as comments.
date_format Defines the format of date values in the data files (data loading) or table (data unloading).
emptyValue Sets the string representation of an empty value.
encoding Defines the character encoding used to read the file. Use UTF-8 for most datasets, especially those containing special or non-English characters.
escape Character used to escape special symbols within a field, such as a quote or delimiter. Example: Using \ allows "New \"York\"" to be read correctly.
ignoreLeadingWhiteSpace When enabled, removes extra white spaces at the beginning of each field value. Helps ensure consistent parsing of fields with irregular spacing.
ignoreTrailingWhiteSpace When enabled, removes white spaces at the end of each field value. Useful for avoiding mismatches caused by unintended spaces.
multiLine Allows fields to span multiple lines if enclosed in quotes. Enable this option when your data contains line breaks within text fields.
nullValue Defines the string that should be interpreted as a null or missing value. Example: Setting null or NA ensures those tokens are treated as empty values.
quote Character used to enclose values containing delimiters, quotes, or line breaks. Commonly set to ". Example: "New York, USA" is treated as one value.
time_format Defines the format of time values in the data files (data loading) or table (data unloading).
empty_field_as_null When loading data, specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters (for example, ,,). When unloading data, this option is used in combination with FIELD_OPTIONALLY_ENCLOSED_BY.

field_delimiter One or more singlebyte or multibyte characters that separate fields in an input file (data loading) or unloaded file (data unloading).
parse_header Boolean that specifies whether to use the first row headers in the data files to determine column names.
record_delimiter One or more singlebyte or multibyte characters that separate records in an input file (data loading) or unloaded file (data unloading).
trim_space Boolean that specifies whether to remove white space from fields.
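
As an illustrative sketch only (assuming a Spark-based ingestion engine), the PySpark read below combines several of the parameters listed above; the camelCase option names match the Spark CSV reader, dateFormat stands in for date_format, and the file path and format values are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-options-demo").getOrCreate()

df = (
    spark.read
    .option("header", True)                    # parse_header: first line holds column names
    .option("sep", ",")                        # field_delimiter
    .option("comment", "#")                    # skip lines starting with '#'
    .option("quote", '"')                      # values wrapped in double quotes
    .option("escape", "\\")                    # backslash escapes quotes inside fields
    .option("nullValue", "NA")                 # treat "NA" as NULL
    .option("emptyValue", "")                  # string representation of an empty value
    .option("encoding", "UTF-8")
    .option("ignoreLeadingWhiteSpace", True)   # trim spaces at the start of each field
    .option("ignoreTrailingWhiteSpace", True)  # trim spaces at the end of each field
    .option("multiLine", True)                 # quoted fields may contain line breaks
    .option("dateFormat", "yyyy-MM-dd")        # date_format (placeholder pattern)
    .csv("/data/input/orders.csv")             # placeholder path
)
df.show(5)
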

 

JSON

Using additional parameters for the JSON file format ensures:

  • Precise schema discovery during ingestion.

  • Consistent rule application in profiling, validation, and transformation.

  • Reduction in errors caused by incorrect file interpretation (e.g., misaligned columns, missing headers).

The following additional parameters are supported for the JSON file format; an illustrative usage sketch follows the table.

Parameter Description
allowSingleQuotes When enabled, allows single quotes in addition to double quotes around field names and string values.
allowUnquotedFieldNames When enabled, allows field names that are not enclosed in quotes.
date_format Defines the format of date values in the data files (data loading) or table (data unloading).
time_format Defines the format of time values in the data files (data loading) or table (data unloading).
multi_line Boolean that specifies whether multiple lines are allowed.
strip_outer_array Boolean that instructs the JSON parser to remove outer brackets (i.e. [ ]).
null_if String used to convert to and from SQL NULL.
encoding For reading, forcibly sets one of the standard basic or extended encodings for the JSON files. For writing, specifies the encoding (charset) of saved JSON files. JSON built-in functions ignore this option.

trim_space Boolean that specifies whether to remove leading and trailing white space from strings.
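
A minimal PySpark sketch follows, again assuming a Spark-based engine; the option names match the Spark JSON reader, timestampFormat is used here as the closest analog of time_format, and the file path and patterns are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-options-demo").getOrCreate()

df = (
    spark.read
    .option("multiLine", True)                            # records may span multiple lines
    .option("allowSingleQuotes", True)                    # accept 'name' as well as "name"
    .option("allowUnquotedFieldNames", True)              # accept {name: "x"} style keys
    .option("dateFormat", "yyyy-MM-dd")                   # date_format (placeholder pattern)
    .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss")   # time_format analog (placeholder)
    .option("encoding", "UTF-8")
    .json("/data/input/events.json")                      # placeholder path
)
df.printSchema()
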

 

Parquet

Using additional parameters for the Parquet file format ensures:

  • Improved schema and data type handling.

  • Fine-grained control over data ingestion.

  • Better compatibility with external systems.

The following additional parameters are supported for the Parquet file format; an illustrative usage sketch follows the table.

Parameter Description
binary_as_text Boolean that specifies whether to interpret columns with no defined logical data type as UTF-8 text. When set to FALSE, Snowflake interprets these columns as binary data.
trim_space Boolean that specifies whether to remove leading and trailing white space from strings.
use_vectorized_scanner Boolean that specifies whether to use a vectorized scanner for loading Parquet files.
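
Because binary_as_text, trim_space, and use_vectorized_scanner correspond to Snowflake Parquet file format options, the sketch below shows one possible way to set them by creating a named file format through the snowflake-connector-python package; the connection details, format name, and the commented COPY INTO reference are placeholders and assumptions, not a documented workflow of this product.

import snowflake.connector

# Placeholder connection details; replace with your account credentials.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_wh",
    database="my_db",
    schema="my_schema",
)

try:
    cur = conn.cursor()
    # Define a named Parquet file format using the parameters described above.
    cur.execute(
        """
        CREATE OR REPLACE FILE FORMAT my_parquet_format
          TYPE = PARQUET
          BINARY_AS_TEXT = TRUE          -- read untyped binary columns as UTF-8 text
          TRIM_SPACE = TRUE              -- strip leading and trailing white space from strings
          USE_VECTORIZED_SCANNER = TRUE  -- use the vectorized Parquet scanner
        """
    )
    # The format can then be referenced when loading, for example:
    # COPY INTO my_table FROM @my_stage FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format')
finally:
    conn.close()
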

 

What's next?

Ingesting Data from Amazon S3 into a Snowflake Data Lake